Trend Analysis of Road Traffic Accidents of Turkey

Traffic Accident Death and Injury Data

Note

This document is a product of the final project for the STAT 570 lecture, focusing on data handling and visualization tools. It is essential to acknowledge that minor errors may be present, and the methods employed may not necessarily reflect the optimal approach related to the data set.

Levent Sarı - levent.sari@metu.edu.tr

Hüseyin Tan - huseyin.tan@metu.edu.tr

Introduction

In today’s world, traffic accidents emerge as a serious public health issue globally. Every year, thousands of lives are lost, and tens of thousands of individuals are injured. This situation adversely affects not only individuals but also the overall well-being of society. The increasing frequency of traffic accidents once again emphasizes the importance of a safe transportation environment.

Traffic accidents continue to be a pressing issue worldwide, posing significant threats to public safety, economic stability, and overall well-being. In Turkey, a country with a dynamic transportation landscape marked by rapid urbanization and increased vehicular traffic, understanding and addressing the factors contributing to traffic accidents is of paramount importance.

With the rapidly growing use of transportation vehicles and roads, the causes and effects of accidents have become more complex. Factors such as driver errors, infrastructure deficiencies, weather conditions, and traffic congestion increase the likelihood of accidents. This underscores the need for more efforts in developing safe transportation systems and addressing existing issues effectively.

AI Generated Image (generated by ChatGPT)

In this study, we used three different data sets. Our goal is to examine the number of deaths and injuries in traffic accidents in Turkey between 2002 and 2022 by age group. We will also present estimated road traffic death rates around the world and Turkey's current position.

Our goals are to:

  • Convert the untidy data sets into tidy data sets.

  • Create a new data set from the cleaned data sets.

  • Explain the data with graphics and tables.

Literature Review

Before conducting our analysis, we reviewed previous studies on road traffic accidents, injuries, and deaths in Turkey. Kaygisiz et al. (2017) define a road traffic accident (RTA) as an accident that involves people or vehicles and happens on a road. Earlier studies (Naci & Baker, 2008) reported that Turkey needed to collect data in an organized manner to understand the various causes of accidents, also suggesting that this could be a key element in reducing the resulting losses (Esiyok et al., 2005). More recent research (Erenler & Gumus, 2019) notes that RTAs are among the ten leading causes of death worldwide and rank even higher in developing countries. The same study reports that RTAs are the leading cause of death for people aged 15 to 29. Many factors contribute to accidents, one of them being economic growth. Puvanachandra et al. (2012) claim that lower-income regions have higher RTA fatalities, while Erenler & Gumus (2019) suggest that developing countries experience more severe fatalities from RTAs. Looking at causes at the level of individual accidents, an epidemiological study (Sungur et al., 2014) reports that fewer than 1% of accidents are caused by non-human factors, while the driver is the responsible party 95% of the time; driving under the influence and exhaustion are among the most common causes. On the economic impact of RTAs, Naci & Baker (2008) suggest that RTA deaths in 2000 cost Turkey's economy $2.6 billion through lost productivity alone. Finally, the recent study by Ozturk (2022) finds that child fatalities contribute considerably to Turkey's health burden.

Thus, our study aims to use three data sets obtained from TURKSTAT and the World Health Organization (WHO) to observe trends and compare Turkey with other countries.

Data sets

Three different data sets were used in this project. Two of these data sets were taken from TURKSTAT and one from the WHO website.

The first TURKSTAT data set contains the following columns:

  • Accidents involving death and personal injury

  • Accidents involving material loss only

  • Total number of accidents

  • Year

The second TURKSTAT data set contains the following columns:

  • Killed Persons

  • Injured Persons

  • Age Groups

  • Year

The WHO data set contains the following columns:

  • Countries

  • Estimated road traffic death rate

  • Year

TURKSTAT raw Data Sets:

You can view screenshots of the data sets in their original format below.

First Data Set

Second Data Set

TURKSTAT Data set Problems:

  • The spreadsheets start and end with explanatory text.

  • Column names are written separately in both English and Turkish.

  • Some columns are left blank for visual purposes.

  • The age group data is divided into two blocks within the same sheet.

  • The data is not in long format.

The WHO data set is already tidy and ready for analysis.

Data Collection and Preprocessing

First, we will load all the libraries and define functions that will be needed throughout the study.

library(dplyr)
library(purrr)
library(readxl)
library(stringr)
library(janitor)
library(tidyverse)
library(rvest)    
library(gridExtra)
library(ggrepel)
library(directlabels)
library(DT)

read_clean <- function(..., sheet){
  read_excel(..., sheet = sheet)
}

options(scipen = 999999)

reorderFactors <- function(df, column = "my_column_name", 
                           desired_level_order = c("fac1", "fac2", "fac3")) {
  
  x = df[[column]]
  lvls_src = levels(x) 
  
  idxs_target <- vector(mode="numeric", length=0)
  for (target in desired_level_order) {
    idxs_target <- c(idxs_target, which(lvls_src == target))
  }
  
  x_new <- factor(x,levels(x)[idxs_target])
  
  df[[column]] <- x_new
  
  return(df)
}

Here, the scipen = 999999 option discourages R from printing numbers in scientific notation, which gives us more readable axis break labels in graphs. The user-defined reorderFactors function arranges factor levels in the order we want on a plot axis, since sorting the data itself (for example with the base R order function) does not change the order in which ggplot2 draws factor levels. To do this, the function looks up the position of each desired level among the existing levels and rebuilds the factor with the levels in that order.

After loading the required libraries, we will download our first data set from the TURKSTAT website.
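As a minimal sketch of both points, the toy example below shows the effect of the option and the base R equivalent of rebuilding a factor with a chosen level order. The data frame df, its level column, and the smaller penalty scipen = 999 are assumptions made for illustration; they are not part of the project code.

```r
# Illustrative only: `df` and its values are made up.
old <- options(scipen = 999)          # bias printing towards fixed notation
fixed_label <- format(1e10)           # printed in full rather than as 1e+10
options(old)                          # restore the previous setting

df <- data.frame(level = factor(c("low", "high", "medium", "low")))
default_order <- levels(df$level)     # alphabetical: "high" "low" "medium"

# Rebuild the factor with the levels in the order wanted on the axis
df$level <- factor(df$level, levels = c("low", "medium", "high"))
new_order <- levels(df$level)         # "low" "medium" "high"
```

Any level left out of the levels argument turns the corresponding values into NA, which is also how reorderFactors behaves for levels that are missing from desired_level_order.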

url = 'https://data.tuik.gov.tr/Bulten/DownloadIstatistikselTablo?p=8RC9RpGXOVWg1rPaE6MQ4FUE37S8S2vsiIJglnqCOrpJfrRCUPa5n3wsXEnCI0Xf'

raw_data = tempfile(fileext = ".xls")
download.file(url, raw_data,
              method = "auto",
              mode = "wb")

sheets <- excel_sheets(raw_data)

raw_data <- map(
  sheets,
  ~read_clean(raw_data,
              skip = 2,
              sheet = .)
) |>
  bind_rows()

head(raw_data,10)
# A tibble: 10 × 8
   ...1  `Toplam kaza` `Maddi hasarlı`    `Ölümlü, yaralanmalı` `Ölü sayısı (1)`
   <chr> <chr>         <chr>              <chr>                 <chr>           
 1 <NA>  sayısı        kaza sayısı        kaza sayısı           Killed persons …
 2 Yıl   Total number  Accidents involvi… Accidents involving … Toplam          
 3 Year  of accidents  material loss only and personal injury   Total           
 4 2002  439777        374029             65748                 4093            
 5 2003  455637        388606             67031                 3946            
 6 2004  537352        460344             77008                 4427            
 7 2005  620789        533516             87273                 4505            
 8 2006  728755        632627             96128                 4633            
 9 2007  825561        718567             106994                5007            
10 2008  950120        845908             104212                4236            
# ℹ 3 more variables: ...6 <chr>, ...7 <chr>, Yaralı <chr>

TURKSTAT spreadsheets usually carry the table title and its English translation in their first two rows, so we skip those rows with skip = 2 in the read_clean call. A glimpse of the result shows that the column names and some rows need further processing.

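# Keep the four accident-count columns and drop the leftover first header row
data_1 = raw_data %>% select(1:4) %>% slice(-1)
# Append a row that joins the two remaining header rows into full column names
data_1 = rbind(data_1, paste(data_1[1,],data_1[2,]))
# Keep the 21 yearly rows (2002-2022) plus the appended header row
data_2 = data_1 %>% slice(3:23,30)
# Promote the appended header row (now row 22) to column names
data_2 = data_2 %>% row_to_names(22,remove_rows_above = FALSE)
colnames(data_2)[1] = 'Year'
data_accidents = data_2; rm(data_1); rm(data_2); rm(raw_data)
# Everything was read as text, so convert all columns to numeric
data_accidents = data_accidents %>% 
  mutate(across(where(is.character),as.numeric))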

Since the second data set shares information with the first, we kept only the first four columns, which are unique to this file, with select(1:4). The first row was corrupted, so we removed it with slice(-1). We then combined the first and second rows of the new data to create proper column names. Lastly, we manually removed the Turkish translation from the Year column name, selected the rows that contain data, and converted the columns to numeric before moving on to the second TURKSTAT data set.
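The header merge described above can be sketched on a toy table. The data frame raw and its layout are assumptions for illustration, and base R is used here in place of janitor's row_to_names; this is a sketch of the idea rather than the project code.

```r
# Toy stand-in for a sheet whose header is split over two rows.
raw <- data.frame(
  X1 = c("Year", NA, "2002", "2003"),
  X2 = c("Total number", "of accidents", "439777", "455637"),
  stringsAsFactors = FALSE
)

top    <- unlist(raw[1, ], use.names = FALSE)
bottom <- unlist(raw[2, ], use.names = FALSE)

# Join the two header rows; where the second row is empty, keep the first as is
header <- ifelse(is.na(bottom), top, paste(top, bottom))

clean <- raw[-(1:2), ]                 # drop the two header rows
names(clean) <- header                 # use the merged header as column names
clean[] <- lapply(clean, as.numeric)   # text columns become numeric
rownames(clean) <- NULL
```

Handling the NA explicitly avoids a pasted name such as "Year NA", which is the kind of label that otherwise has to be renamed by hand afterwards.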

url = 'https://data.tuik.gov.tr/Bulten/DownloadIstatistikselTablo?p=2/Sym42hc5kOF437mqcxligj8l5uHDGvvOSKXfSdmBmVHwkyus9lGDyc36ojWVBg'

raw_data = tempfile(fileext = ".xls")
download.file(url, raw_data,
              method = "auto",
              mode = "wb")

sheets <- excel_sheets(raw_data)


raw_data <- map(
  sheets,
  ~read_clean(raw_data,
              skip = 2,
              sheet = .)
) |>
  bind_rows()

head(raw_data,30)
# A tibble: 30 × 12
   ...1  Yaş grupları - Age gr…¹ ...3  ...4  ...5  ...6  ...7  ...8  ...9  ...10
   <chr> <chr>                   <chr> <lgl> <chr> <chr> <lgl> <chr> <chr> <lgl>
 1 <NA>  0 - 9                   <NA>  NA    10 -… <NA>  NA    15 -… <NA>  NA   
 2 Yıl   Ölü sayısı (1)          Yara… NA    Ölü … Yara… NA    Ölü … Yara… NA   
 3 Year  Killed persons (1)      Inju… NA    Kill… Inju… NA    Kill… Inju… NA   
 4 2002  322                     8788  NA    84    4524  NA    68    4572  NA   
 5 2003  178                     7149  NA    68    3992  NA    67    4533  NA   
 6 2004  225                     8148  NA    80    4642  NA    66    5152  NA   
 7 2005  179                     9077  NA    108   5988  NA    58    6095  NA   
 8 2006  178                     9237  NA    89    6133  NA    74    6673  NA   
 9 2007  179                     10333 NA    89    6790  NA    95    7337  NA   
10 2008  151                     9486  NA    80    6689  NA    65    6930  NA   
# ℹ 20 more rows
# ℹ abbreviated name: ¹​`Yaş grupları - Age groups`
# ℹ 2 more variables: ...11 <chr>, ...12 <chr>

For the second TURKSTAT data set, we repeat the reading process. The first 30 rows show that the table was split into two blocks that are stacked vertically. To solve this, we split the data in two, taking the row for the last year, 2022, as the last row of each block. We also removed the all-NA columns, which exist only for styling purposes in Excel.
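As a minimal sketch of the split, assuming a made-up first column called year_col that mimics two stacked blocks, the cut point is simply the first position where the year equals 2022:

```r
# Made-up first column of a sheet with two vertically stacked blocks.
year_col <- c(NA, "Year", "2002", "2022", NA, "Year", "2002", "2022")

# Position of the first "2022"; NA entries are ignored by which()
cut_row <- min(which(year_col == "2022"))

first_block  <- year_col[seq_len(cut_row)]    # rows up to and including the cut
second_block <- year_col[-seq_len(cut_row)]   # everything after the cut
```

The project code applies the same cut with row_number() inside filter().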

data_1 = raw_data %>% 
  slice(1:49) %>% 
  filter(row_number() <= min(which(...1 == 2022))) %>% 
  select(where(~!all(is.na(.)))) %>% 
  slice(-2)

head(data_1)
# A tibble: 6 × 9
  ...1  `Yaş grupları - Age groups` ...3     ...5  ...6  ...8  ...9  ...11 ...12
  <chr> <chr>                       <chr>    <chr> <chr> <chr> <chr> <chr> <chr>
1 <NA>  0 - 9                       <NA>     10 -… <NA>  15 -… <NA>  18 -… <NA> 
2 Year  Killed persons (1)          Injured… Kill… Inju… Kill… Inju… Kill… Inju…
3 2002  322                         8788     84    4524  68    4572  129   7143 
4 2003  178                         7149     68    3992  67    4533  120   6880 
5 2004  225                         8148     80    4642  66    5152  122   7984 
6 2005  179                         9077     108   5988  58    6095  140   8990 

Looking at the head of the data, we see that each age group is labeled only in the first of its columns. To fix this before merging the header rows into column names, as we did for the first TURKSTAT data set, we filled each NA in the first row with the last non-NA value before it.

for (i in 2:ncol(data_1)) {
  if (is.na(data_1[1, i])) {
    if (!is.na(data_1[1, i - 1])) {
      data_1[1, i] = data_1[1, i - 1]
    }
  }
}

data_1 = rbind(paste(data_1[1,],data_1[2,]),data_1)
data_1 = data_1 %>% 
  slice(-2,-3) %>% 
  row_to_names(1,remove_rows_above = F)
colnames(data_1)[1] = 'Year'
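The forward fill used above can be seen in isolation on a toy header row. The vector header is made up to mimic the age-group row; the loop is the same idea applied to a plain character vector.

```r
# Made-up header row: each age group is labeled only in its first column.
header <- c(NA, "0 - 9", NA, "10 - 14", NA)

# Carry the last non-NA label forward into the NA cell that follows it
for (i in 2:length(header)) {
  if (is.na(header[i]) && !is.na(header[i - 1])) {
    header[i] <- header[i - 1]
  }
}
```

The leading NA (the Year column) stays NA because there is nothing before it to copy.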

Since the data is in wide format, we used a regex pattern and the pivot_longer function to transform it into long format.

data_transformed_1 = data_1 %>%
  pivot_longer(cols = -Year, 
               names_to = c("Age_Group", "Killed_Or_Injured"), 
               names_pattern = "(\\d+\\s-\\s\\d+)\\s(\\w+)\\spersons") %>%
  mutate(Age_Group = gsub("-", "_", Age_Group)) %>%
  filter(!is.na(value))
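To make the role of names_pattern concrete, here is a minimal sketch on a toy wide table. The data frame wide is an assumption that mimics two of the cleaned column names, filled with the 2002 and 2003 values for the 0 - 9 group shown in the output above.

```r
library(tidyr)

# Toy wide table mimicking the cleaned headers (one age group only).
wide <- data.frame(
  Year = c("2002", "2003"),
  `0 - 9 Killed persons (1)` = c("322", "178"),
  `0 - 9 Injured persons`    = c("8788", "7149"),
  check.names = FALSE
)

# Each capture group in the pattern fills one of the names_to columns:
# group 1 is the age range, group 2 is the word before "persons".
long <- pivot_longer(
  wide,
  cols = -Year,
  names_to = c("Age_Group", "Killed_Or_Injured"),
  names_pattern = "(\\d+\\s-\\s\\d+)\\s(\\w+)\\spersons"
)
```

Each input row yields one output row per measured column, so the two years become four rows with the age group and the outcome in separate columns.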

The same process is repeated for the second part of the data. However, since the age groups ‘65+’ and ‘Unknown’ do not fit our selection regex, pivot_longer returns NA for their labels, and we fill those in manually with a loop.

data_2 = raw_data %>% 
  slice(1:49) %>% 
  filter(row_number() > min(which(...1 == 2022))) %>% 
  select(where(~!all(is.na(.)))) %>% 
  slice(-1,-3)

for (i in 2:ncol(data_2)) {
  if (is.na(data_2[1, i])) {
    if (!is.na(data_2[1, i - 1])) {
      data_2[1, i] = data_2[1, i - 1]
    }
  }
}

data_2 = rbind(paste(data_2[1,],data_2[2,]),data_2)
data_2 = data_2 %>% 
  slice(-2,-3) %>% 
  row_to_names(1,remove_rows_above = F)
colnames(data_2)[1] = 'Year'

data_transformed_2 = data_2 %>%
  pivot_longer(cols = -Year, 
               names_to = c("Age_Group", "Killed_Or_Injured"), 
               names_pattern = "(\\d+\\s-\\s\\d+)\\s(\\w+)\\spersons") %>%
  mutate(Age_Group = gsub("-", "_", Age_Group)) %>%
  filter(!is.na(value))

for(i in 1:nrow(data_transformed_2)){
  if(is.na(data_transformed_2[i,"Age_Group"])){
    if(i%%2==1){
      data_transformed_2[i,"Age_Group"] = '65 +'
      data_transformed_2[i,"Killed_Or_Injured"] = 'Killed'
    }
    if(i%%2==0){
      data_transformed_2[i,"Age_Group"] = 'Unknown'
      data_transformed_2[i,"Killed_Or_Injured"] = 'Injured'
    }
  }
}
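A possible alternative to patching the NA labels afterwards is to broaden the pattern so that it captures these two groups directly. Note that the parity loop above can only ever produce the combinations ('65 +', 'Killed') and ('Unknown', 'Injured'), so it is worth checking the result with a count per Age_Group and Killed_Or_Injured. The sketch below uses base R and hypothetical header strings (cols); the real labels for these two groups may be spelled differently in the spreadsheet.

```r
# Hypothetical header strings; the actual "65 +" / "Unknown" labels may differ.
cols <- c("25 - 64 Killed persons (1)",
          "65 + Injured persons",
          "Unknown Killed persons (1)")

narrow <- "(\\d+\\s-\\s\\d+)\\s(\\w+)\\spersons"
broad  <- "(\\d+\\s-\\s\\d+|\\d+\\s\\+|Unknown)\\s(\\w+)\\spersons"

# Return one row per header: age group and outcome, or NA when nothing matches
parse_header <- function(x, pattern) {
  hits <- regmatches(x, regexec(pattern, x, perl = TRUE))
  t(vapply(hits,
           function(h) if (length(h) == 3) h[2:3] else c(NA_character_, NA_character_),
           character(2)))
}

narrow_hits <- parse_header(cols, narrow)  # only the "25 - 64" header matches
broad_hits  <- parse_header(cols, broad)   # all three headers match
```

If the broadened pattern fits the real headers, passing it as names_pattern would let pivot_longer label all four columns of these groups without a follow-up loop.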

The two parts of the data are then merged.

data_fin = rbind(data_transformed_1,data_transformed_2)

data_fin = data_fin %>% 
  mutate(Year=as.numeric(Year),
         value = as.numeric(value))

data_accidents_full = data_fin %>% 
  left_join(data_accidents,by = 'Year')

rm(data_1,data_2,data_accidents,data_fin,data_transformed_1,data_transformed_2,raw_data)
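One consequence of this join is worth seeing on a small scale: the yearly totals are repeated on every age-group row of the same year. The sketch below uses base R merge as a stand-in for left_join, with toy data frames (long, totals) built from values shown earlier.

```r
# Long table: several rows per year (toy subset).
long <- data.frame(
  Year = c(2002, 2002, 2003, 2003),
  Killed_Or_Injured = c("Killed", "Injured", "Killed", "Injured"),
  value = c(322, 8788, 178, 7149)
)
# Yearly totals: one row per year.
totals <- data.frame(Year = c(2002, 2003),
                     total_accidents = c(439777, 455637))

# all.x = TRUE keeps every row of `long`, like a left join
joined <- merge(long, totals, by = "Year", all.x = TRUE)

# The totals now appear once per long-format row, so de-duplicate before tabulating
yearly <- unique(joined[, c("Year", "total_accidents")])
```

This is why the yearly tables and plots later select the total columns and call unique() before displaying them.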

Lastly, to observe Turkey’s position in accidents and death rates relative to other countries, the third data set is downloaded from the WHO website and read into R.

#The link for the following data: https://www.who.int/data/gho/data/indicators/indicator-details/GHO/estimated-road-traffic-death-rate-(per-100-000-population)

who_data = read.csv('data.csv')
who_data_filtered = who_data %>% 
  select(Location,Period,Value) %>% 
  separate(Value, into = c('Value'), sep = ' '); rm(who_data)

who_data_filtered = who_data_filtered %>% 
  mutate(Period = as.factor(Period),
         Value = as.numeric(Value))

head(who_data_filtered)
                          Location Period Value
1              Antigua and Barbuda   2019  0.00
2 Micronesia (Federated States of)   2019  0.16
3                         Maldives   2019  1.63
4                         Kiribati   2019  1.92
5                            Egypt   2019 10.10
6                          Ukraine   2019 10.20

As seen in the code, no pre-processing apart from selecting useful columns and fixing their classes is needed for the WHO data as it is already tidy.
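The separate() call keeps only the text before the first space in Value. As a minimal base R sketch of that step, assuming the raw strings combine a point estimate with a bracketed interval (the example strings in value_raw are made up):

```r
# Assumed raw format: "estimate [lower-upper]"; these strings are illustrative.
value_raw <- c("10.10 [8.90-11.40]", "0.16 [0.10-0.20]")

# Drop everything from the first whitespace onwards, then convert to numeric
point_estimate <- as.numeric(sub("\\s.*$", "", value_raw))
```

Only the point estimate is kept; the interval is discarded, as in the project code.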

Data Visualization and Conclusions

In exploratory analysis, with the help of the tidyverse and DT packages, various pivot tables as well as plots can be created.

  • Yearly Number of Accidents in Turkey

To view the table of yearly accidents in Turkey, as well as the summary statistics, we will first create two data tables. For the first frequency table, simply selecting the desired columns and piping them into the datatable() function is sufficient.

data_accidents_full %>% 
  select(Year,`Total number of accidents`,`Accidents involving material loss only`,`Accidents involving death and personal injury`) %>% 
  unique() %>% 
  datatable(class = "compact",
            caption = 'Yearly Number of Accidents\nin Turkey',
            options = list(pageLength = nrow(.)))

For the summary statistics table, we will first select the columns we want to work with, calculate their summary statistics, convert the output into a data frame and then place the data frame into the datatable() function.

data_accidents_full %>% 
  select(`Total number of accidents`,`Accidents involving material loss only`,`Accidents involving death and personal injury`) %>% 
  summary() %>% 
  as.data.frame() %>% 
  select(Var2, Freq) %>% 
  datatable(class = 'compact',
            options = list(pageLength = nrow(.)),
            colnames = c('Type','Sum. Stat.'),
            caption = 'Summary Statistics of Accidents in Turkey between 2002-2022')
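To see why the pipeline selects Var2 and Freq, it helps to look at what as.data.frame() returns for a summary table. The sketch below uses a small data frame acc with shortened, made-up column names.

```r
# Small data frame with shortened column names (values from the first rows above).
acc <- data.frame(total  = c(439777, 455637, 537352),
                  injury = c(65748, 67031, 77008))

# summary() returns a table; as.data.frame() flattens it to Var1 / Var2 / Freq
tab <- as.data.frame(summary(acc))

# Var2 holds the (padded) column name, Freq the "statistic : value" text
stat_table <- data.frame(Type = trimws(tab$Var2),
                         Stat = trimws(tab$Freq))
```

Var1 is an empty row label, which is why it is dropped; each column contributes six rows, one per statistic.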

We will also visualize the data to make it more accessible.

data_accidents_full %>% 
  select(Year,`Total number of accidents`,`Accidents involving material loss only`,`Accidents involving death and personal injury`) %>% 
  group_by(Year) %>% 
  mutate(Year = as.factor(Year)) %>% 
  unique() %>%
  gather('Accident','Count',-Year) %>% 
  filter(!Accident=='Total number of accidents') %>% 
  ggplot(.,aes(x = Year,y = Count, fill = Accident, group = Accident)) +
  geom_bar(stat = 'identity') + 
  geom_text(aes(label = format(after_stat(y), big.mark = ".", scientific = FALSE), group = Accident), fontface = 'bold',
            stat = 'summary', fun = sum, vjust = 0.5,hjust = 1.4, position = position_stack()) +
  theme_minimal() + 
  labs(x = 'Year', y = 'Count', title = 'Accidents per Year in Turkey', subtitle = '2002-2022') +
  theme(axis.text.x = element_text(size = 12, face = 'bold'),
        axis.title.x = element_text(size = 13, face = 'bold'),
        axis.text.y = element_text(size = 12, face = 'bold'),
        axis.title.y = element_text(size = 13, face = 'bold'),
        legend.position = 'right',
        title = element_text(size = 14, face = 'bold'),
        plot.subtitle = element_text(size = 13, face = 'italic'),
        plot.background = element_rect(fill = '#F4F4F4'),
        panel.background = element_rect(fill = '#F4F4F4'),
        strip.background = element_rect(fill = '#F4F4F4')) + 
  scale_y_continuous(labels=function(x) format(x, big.mark = ".", scientific = FALSE)) +
  coord_flip() + 
  scale_fill_manual(values =c('#ea7286','#a9c484'))

The plot above shows the yearly number of accidents involving death and personal injury and of accidents involving material loss only in Turkey. Between 2002 and 2012, the total number of accidents increased each year. Accidents involving material loss only peaked in 2012, while accidents involving death and personal injury peaked in 2022. The table below shows the yearly frequencies of deaths and injuries for every age group.

data_accidents_full %>% 
  select(Year,Age_Group,Killed_Or_Injured,value) %>% 
  pivot_wider(names_from = c("Killed_Or_Injured", "Age_Group"),values_from = "value") %>% 
  rename_with(~ gsub("Killed_", "Killed Age: ", .), starts_with("Killed_")) %>%
  rename_with(~ gsub("Injured_", "Injured Age: ", .), starts_with("Injured_")) %>% 
  datatable(class = "compact",
            caption = 'Yearly Killed or Injured Persons by Age Groups\nin Turkey',
            options = list(pageLength = nrow(.)))

We also decided to go into detail about the percentage of deaths and injuries in accidents involving death or personal injury. To do so, it is enough to use the mutate function to create killed and injured percentages before gathering the data for ggplot.
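As a minimal sketch of that percentage step, using made-up counts (the data frame counts and its numbers are assumptions), the calculation reduces to:

```r
# Made-up yearly counts, chosen so the percentages are easy to verify by hand.
counts <- data.frame(Year    = c(2002, 2003),
                     Killed  = c(40, 30),
                     Injured = c(960, 1470))

counts$Total              <- counts$Killed + counts$Injured
counts$Killed_percentage  <- counts$Killed  / counts$Total * 100   # 4 and 2
counts$Injured_percentage <- counts$Injured / counts$Total * 100   # 96 and 98
```

The full pipeline below applies the same arithmetic after summing value by Year and Killed_Or_Injured.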

data_accidents_full %>%
  group_by(Year, Killed_Or_Injured) %>%
  mutate(Year = as.factor(Year)) %>% 
  summarise(total_count = sum(value,na.rm = T)) %>%
  spread(Killed_Or_Injured, total_count, fill = 0) %>%
  mutate(Total = Killed + Injured,
         Killed_percentage = (Killed / Total) * 100,
         Injured_percentage = (Injured / Total) * 100) %>%
  select(Year, Killed_percentage, Injured_percentage) %>% 
  gather('Type','Percentage',-Year) %>% 
  ggplot(.,aes(x = Year,y = `Percentage`, fill = Type)) +
  geom_bar(stat = 'identity') + 
  geom_text(aes(label = paste0('% ', round(Percentage,2))),fontface = 'bold',size = 5,hjust = 0.3) + 
  theme_minimal() + 
  labs(x = 'Year', y = 'Percentage', title = 'Death/Injury Percentages per Year in Turkey', subtitle = '2002-2022') +
  theme(axis.text.x = element_text(size = 12, face = 'bold'),
        axis.title.x = element_text(size = 13, face = 'bold'),
        axis.text.y = element_text(size = 12, face = 'bold'),
        axis.title.y = element_text(size = 13, face = 'bold'),
        legend.position = 'right',
        title = element_text(size = 14, face = 'bold'),
        plot.subtitle = element_text(size = 13, face = 'italic'),
        plot.background = element_rect(fill = '#F4F4F4'),
        panel.background = element_rect(fill = '#F4F4F4'),
        strip.background = element_rect(fill = '#F4F4F4')) + 
  scale_y_continuous(labels=function(x) format(x, big.mark = ".", scientific = FALSE)) +
  coord_flip() + 
  scale_fill_manual(values = c('#eab281','#ea7286'))

In this chart, we show deaths and injuries as percentages by year. Each year, deaths make up between one and three percent of the people killed or injured in traffic accidents. To detail our findings further, we created line graphs that investigate trends in deaths and injuries per age group, as well as overall deaths and injuries per year. Here, we plotted the 25-64 age group separately from the others, as the magnitude of its counts far exceeds that of the other groups. Then, using the grid.arrange function from the gridExtra library, we merged the plots into a single output.

pa1 = data_accidents_full %>% 
  select(Age_Group,Year,value) %>% 
  unique() %>%
  filter(Year %in% c(2002:2020)) %>%
  filter(!Age_Group == '25 _ 64') %>% 
  mutate(Year = as.factor(Year)) %>% 
  group_by(Age_Group,Year) %>% 
  summarise(value = sum(value,na.rm = T)) %>% 
  mutate(label = if_else(Year == 2020, as.character(Age_Group), NA_character_)) %>%
  ggplot(.,aes(x = Year, y = value, group = Age_Group, color = Age_Group)) + 
  geom_line(size = 1) + 
  geom_point(size = 2.2) +
  theme_minimal() + 
  geom_label_repel(aes(label = label),
                   nudge_x = 1,
                   na.rm = TRUE,
                   fontface = 'bold') +
  scale_color_manual(values = c('#eab281','#e3e19f','#a9c484','#5d937b','#58525a','#a07ca7','#f4a4bf'))+
  scale_y_continuous(labels=function(x) format(x, big.mark = ".", scientific = FALSE)) +
  labs(x = 'Year', y = 'Persons', title = 'Persons Killed/Injured in Accidents', subtitle = '2002-2020, All Age Groups\nNot Including 25-64') +
  theme(axis.text.x = element_text(size = 12, face = 'bold'),
        axis.title.x = element_text(size = 13, face = 'bold'),
        axis.text.y = element_text(size = 12, face = 'bold'),
        axis.title.y = element_text(size = 13, face = 'bold'),
        legend.position = 'none',
        title = element_text(size = 14, face = 'bold'),
        plot.subtitle = element_text(size = 13, face = 'italic'),
        plot.background = element_rect(fill = '#F4F4F4'),
        panel.background = element_rect(fill = '#F4F4F4'),
        strip.background = element_rect(fill = '#F4F4F4'))


pa2 = data_accidents_full %>% 
  select(Age_Group,Year,value) %>% 
  unique() %>%
  filter(Year %in% c(2002:2020)) %>% 
  filter(Age_Group == '25 _ 64') %>% 
  mutate(Year = as.factor(Year)) %>% 
  group_by(Age_Group,Year) %>% 
  summarise(value = sum(value,na.rm = T)) %>% 
  ggplot(.,aes(x = Year, y = value, group = 1, color = '#eab281')) + 
  geom_line(size = 1) + 
  geom_point(size = 2.2) +
  theme_minimal() + 
  scale_color_manual(values = c('#ea7286')) +
  scale_y_continuous(labels=function(x) format(x, big.mark = ".", scientific = FALSE)) +
  labs(x = 'Year', y = 'Persons', title = 'Persons Killed/Injured in Accidents', subtitle = '2002-2020, Ages 25-64') +
  theme(axis.text.x = element_text(size = 12, face = 'bold'),
        axis.title.x = element_text(size = 13, face = 'bold'),
        axis.text.y = element_text(size = 12, face = 'bold'),
        axis.title.y = element_text(size = 13, face = 'bold'),
        legend.position = 'none',
        title = element_text(size = 14, face = 'bold'),
        plot.subtitle = element_text(size = 13, face = 'italic'),
        plot.background = element_rect(fill = '#F4F4F4'),
        panel.background = element_rect(fill = '#F4F4F4'),
        strip.background = element_rect(fill = '#F4F4F4'))


grid.arrange(pa1,pa2,ncol = 1)

As you can see, the age groups follow similar trends, except for the 65+ group. In the first graph, the 21-24 age group has the highest number of deaths or injuries and the 65+ group the lowest.

Similarly, two line plots of deaths and injuries in Turkey visualize the overall trend for the country.

p1 = data_accidents_full %>% 
  filter(Year %in% c(2002:2020)) %>% 
  mutate(Year = as.factor(Year)) %>% 
  filter(Killed_Or_Injured == 'Killed') %>% 
  select(Year,`Accidents involving death and personal injury`,Killed_Or_Injured,value) %>% 
  group_by(Year,Killed_Or_Injured) %>% 
  summarise(value = sum(value)) %>% 
  ggplot(.,aes(x = Year, y = value, group = Killed_Or_Injured, color = Killed_Or_Injured)) + 
  geom_line(size = 1) + 
  geom_point(size = 2.2) +
  geom_text(aes(label = format(after_stat(y), big.mark = ".", scientific = FALSE), fontface = 'bold'),
            stat = 'summary', fun = sum, vjust = 0.5,hjust = -0.4, color = 'black') + 
  scale_y_continuous(labels=function(x) format(x, big.mark = ".", scientific = FALSE)) +
  theme_minimal() + 
  labs(x = 'Year', y = 'Persons', title = 'Persons Killed in Accidents', subtitle = '2002-2020') +
  theme(axis.text.x = element_text(size = 12, face = 'bold'),
        axis.title.x = element_text(size = 13, face = 'bold'),
        axis.text.y = element_text(size = 12, face = 'bold'),
        axis.title.y = element_text(size = 13, face = 'bold'),
        legend.position = 'none',
        title = element_text(size = 14, face = 'bold'),
        plot.subtitle = element_text(size = 13, face = 'italic'),
        plot.background = element_rect(fill = '#F4F4F4'),
        panel.background = element_rect(fill = '#F4F4F4'),
        strip.background = element_rect(fill = '#F4F4F4'))


p2 = data_accidents_full %>% 
  filter(Year %in% c(2002:2020)) %>% 
  mutate(Year = as.factor(Year)) %>% 
  filter(Killed_Or_Injured == 'Injured') %>% 
  select(Year,`Accidents involving death and personal injury`,Killed_Or_Injured,value) %>% 
  group_by(Year,Killed_Or_Injured) %>% 
  summarise(value = sum(value)) %>% 
  ggplot(.,aes(x = Year, y = value, group = Killed_Or_Injured, color = Killed_Or_Injured)) + 
  geom_line(size = 1) + 
  geom_point(size = 2.2) +
  geom_text(aes(label = format(after_stat(y), big.mark = ".", scientific = FALSE), fontface = 'bold'),
            stat = 'summary', fun = sum, vjust = 0.5,hjust = -0.4, color = 'black') + 
  theme_minimal() + 
  scale_y_continuous(labels=function(x) format(x, big.mark = ".", scientific = FALSE)) +
  labs(x = 'Year', y = 'Persons', title = 'Persons Injured in Accidents', subtitle = '2002-2020') +
  theme(axis.text.x = element_text(size = 12, face = 'bold'),
        axis.title.x = element_text(size = 13, face = 'bold'),
        axis.text.y = element_text(size = 12, face = 'bold'),
        axis.title.y = element_text(size = 13, face = 'bold'),
        legend.position = 'none',
        title = element_text(size = 14, face = 'bold'),
        plot.subtitle = element_text(size = 13, face = 'italic'),
        plot.background = element_rect(fill = '#F4F4F4'),
        panel.background = element_rect(fill = '#F4F4F4'),
        strip.background = element_rect(fill = '#F4F4F4'))
grid.arrange(p1,p2,ncol = 1)

These graphs show the numbers of people killed and injured in traffic accidents separately. Neither series follows a stable trend. The number of deaths doubled in 2015 compared to the previous year. The reason is given in the explanatory notes of the spreadsheet, which we removed during cleaning:

Until 2015, the figures on persons killed include only deaths at the accident scene. Since 2015, they also include deaths within 30 days of the accident among people who were injured and sent to health facilities.

  • Yearly Death Rates of Countries Comparison

Lastly, using the data we obtained from the WHO, we will observe how Turkey’s road traffic death rate compares with that of selected countries. First, we will look at yearly death rates per 100.000 people in Turkey.

who_data_filtered %>% 
  filter(Location=='Türkiye') %>% 
  ggplot(.,aes(x = Period,y = `Value`, fill = '#ea7286')) +
  geom_bar(stat = 'identity') + 
  geom_text(aes(label = Value),fontface = 'bold',size = 5,hjust = -0.3) + 
  theme_minimal() + 
  labs(x = 'Year', y = 'Death Rate per 100.000 People', title = 'Road Traffic Death Rates per 100.000 People in Turkey', subtitle = 'Estimated, 2000-2019') +
  theme(axis.text.x = element_text(size = 12, face = 'bold'),
        axis.title.x = element_text(size = 13, face = 'bold'),
        axis.text.y = element_text(size = 12, face = 'bold'),
        axis.title.y = element_text(size = 13, face = 'bold'),
        legend.position = 'none',
        title = element_text(size = 14, face = 'bold'),
        plot.subtitle = element_text(size = 13, face = 'italic'),
        plot.background = element_rect(fill = '#F4F4F4'),
        panel.background = element_rect(fill = '#F4F4F4'),
        strip.background = element_rect(fill = '#F4F4F4')) + 
  coord_flip() + 
  scale_fill_manual(values = c('#eab281'))

In the first plot, we can see that the death rate in Turkey mostly declined after 2000, to between 6 and 7, before peaking in 2011 at almost double the previous year's rate. Afterwards, it declined steadily until 2019, when it seems to have returned to pre-2011 levels.

Secondly, we can see how Turkey fares against G7 countries in road traffic death rates.

who_data_filtered %>% 
  filter(Location %in% c('Türkiye', 'Canada', 'France', 'Germany', 'Italy', 'Japan',
                         'United Kingdom of Great Britain and Northern Ireland',
                         'United States of America')) %>% 
  mutate(label = if_else(Period == '2019', as.character(Location), NA_character_)) %>%
  ggplot(.,aes(x = Period, y = Value, group = Location, color = Location)) + 
  geom_line(size = 1.2) + 
  geom_point(size = 2.2) +
  theme_minimal() + 
  geom_label_repel(aes(label = label),
                   nudge_x = 1,
                   na.rm = TRUE,
                   fontface = 'bold') + 
  scale_color_manual(values = c('#ea7286','#eab281','#e3e19f','#a9c484','#5d937b','#58525a','#a07ca7','#f4a4bf'))+
  scale_y_continuous(labels=function(x) format(x, big.mark = ".", scientific = FALSE)) +
  labs(x = 'Year', y = 'Death Rate per 100.000 People', title = 'Death Rates per 100.000 People', subtitle = '2000-2019, Turkey & G7 Countries') +
  theme(axis.text.x = element_text(size = 12, face = 'bold'),
        axis.title.x = element_text(size = 13, face = 'bold'),
        axis.text.y = element_text(size = 12, face = 'bold'),
        axis.title.y = element_text(size = 13, face = 'bold'),
        legend.position = 'none',
        title = element_text(size = 14, face = 'bold'),
        plot.subtitle = element_text(size = 13, face = 'italic'),
        plot.background = element_rect(fill = '#F4F4F4'),
        panel.background = element_rect(fill = '#F4F4F4'),
        strip.background = element_rect(fill = '#F4F4F4'))

When we compare Turkey with the G7 countries, we can see that Turkey had death rates similar to those of the European countries and Japan until 2011, when the sharp spike separated it from them and brought it almost to the level of the USA. It can also be seen that the USA had higher death rates than the European G7 countries, Turkey, and Japan throughout the period. To see the trend in Turkey more clearly, we can also highlight Turkey against the rest of the countries.

who_data_filtered %>% 
  filter(Location=='Türkiye' | Location=='Canada' | Location=='France' | Location=='Germany' | Location =='Italy' | Location == 'Japan'
         | Location == 'United Kingdom of Great Britain and Northern Ireland'
         | Location == 'United States of America') %>% 
  mutate(label = if_else(Period == '2019', as.character(Location), NA_character_)) %>%
  ggplot(.,aes(x = Period, y = Value, group = Location, color = Location)) + 
  geom_line(size = 1.2) + 
  geom_point(size = 2.2) +
  theme_minimal() + 
  geom_label_repel(aes(label = label),
                   nudge_x = 1,
                   na.rm = TRUE,
                   fontface = 'bold') + 
  scale_color_manual(values = c('grey','grey','grey','grey','grey','darkred','grey','grey'))+
  scale_y_continuous(labels=function(x) format(x, big.mark = ".", scientific = FALSE)) +
  labs(x = 'Year', y = 'Death Rate per 100.000 People', title = 'Death Rates per 100.000 People', subtitle = '2000-2019, Turkey & G7 Countries') +
  theme(axis.text.x = element_text(size = 12, face = 'bold'),
        axis.title.x = element_text(size = 13, face = 'bold'),
        axis.text.y = element_text(size = 12, face = 'bold'),
        axis.title.y = element_text(size = 13, face = 'bold'),
        legend.position = 'none',
        title = element_text(size = 14, face = 'bold'),
        plot.subtitle = element_text(size = 13, face = 'italic'),
        plot.background = element_rect(fill = '#F4F4F4'),
        panel.background = element_rect(fill = '#F4F4F4'),
        strip.background = element_rect(fill = '#F4F4F4'))
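
The highlighted version above appears to rely on the colours being listed in the alphabetical order of Location, which is why 'darkred' sits in the sixth position. As a minimal sketch of an alternative, assuming ggplot2 is installed, a named colour vector ties each colour to a country explicitly. The names countries, highlight_cols, and toy are hypothetical, and the toy values are placeholders rather than WHO figures.

```r
library(ggplot2)

# Country names copied from the filter() calls above.
countries <- c('Canada', 'France', 'Germany', 'Italy', 'Japan', 'Türkiye',
               'United Kingdom of Great Britain and Northern Ireland',
               'United States of America')

# Named vector: every country grey, then override the one to highlight.
highlight_cols <- setNames(rep('grey', length(countries)), countries)
highlight_cols['Türkiye'] <- 'darkred'

# Hypothetical placeholder values, NOT the WHO data; they only make the sketch runnable.
toy <- data.frame(
  Location = rep(countries, each = 2),
  Period   = rep(c(2018, 2019), times = length(countries)),
  Value    = rep(seq_along(countries), each = 2)
)

# scale_color_manual() matches named values to the levels of Location by name,
# so the highlight no longer depends on the position in the vector.
p <- ggplot(toy, aes(x = Period, y = Value, group = Location, color = Location)) +
  geom_line() +
  scale_color_manual(values = highlight_cols)
```

Because the colours are matched by name, adding or removing a country from the filter would not shift the highlight to a different line.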

We will also look at the 20-year averages for these countries, as well as for the countries with the highest and lowest averages. First, we will filter the data and sort the countries in decreasing order of their average.

who_data_filtered_plot = who_data_filtered %>% 
  group_by(Location) %>% 
  summarise(Value = mean(Value)) %>% 
  filter(Location=='Türkiye' | Location=='Canada' | Location=='France' | Location=='Germany' | Location =='Italy' | Location == 'Japan'
         | Location == 'United Kingdom of Great Britain and Northern Ireland'
         | Location == 'United States of America' | Value == min(Value) | Value == max(Value)) 
who_data_filtered_plot$Location = as.factor(who_data_filtered_plot$Location)
who_data_filtered_plot[order(who_data_filtered_plot$Value,decreasing = T),]
# A tibble: 10 × 2
   Location                                             Value
   <fct>                                                <dbl>
 1 Thailand                                             36.8 
 2 United States of America                             13.6 
 3 Italy                                                 8.58
 4 Türkiye                                               8.11
 5 France                                                7.67
 6 Canada                                                7.49
 7 Japan                                                 6.74
 8 Germany                                               5.90
 9 United Kingdom of Great Britain and Northern Ireland  4.75
10 Maldives                                              2.16
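
The same selection can be written more compactly with %in% and arrange(). The following is only a sketch, assuming dplyr is installed: the averages are copied from the output printed above, while avgs, keep, selected, and the 'Example country' row are hypothetical names and values added to show that a non-matching row is dropped.

```r
library(dplyr)

# Averages copied from the tibble printed above, plus one clearly
# hypothetical row ('Example country') that the filter should drop.
avgs <- tibble(
  Location = c('Thailand', 'United States of America', 'Italy', 'Türkiye',
               'France', 'Canada', 'Japan', 'Germany',
               'United Kingdom of Great Britain and Northern Ireland',
               'Maldives', 'Example country'),
  Value = c(36.8, 13.6, 8.58, 8.11, 7.67, 7.49, 6.74, 5.90, 4.75, 2.16, 20)
)

# Turkey and the G7 countries, spelled as in the WHO data.
keep <- c('Türkiye', 'Canada', 'France', 'Germany', 'Italy', 'Japan',
          'United Kingdom of Great Britain and Northern Ireland',
          'United States of America')

# Keep the listed countries plus the rows with the lowest and highest average,
# then sort in decreasing order of the average.
selected <- avgs %>%
  filter(Location %in% keep | Value == min(Value) | Value == max(Value)) %>%
  arrange(desc(Value))
```

Keeping the country names in one vector also means the list only has to be maintained in a single place if it is reused across several plots.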

For extra safety, we will use the previously defined reorderFactors function to set the order of the factor levels, allowing us to use the data in ggplot as is.

who_data_filtered_plot = reorderFactors(who_data_filtered_plot,'Location',c('Maldives','United Kingdom of Great Britain and Northern Ireland','Germany',
                                                                            'Japan','Canada','France','Türkiye','Italy','United States of America','Thailand'))
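
If a helper such as reorderFactors is not at hand, the level order can also be derived from the values themselves. This is a base R sketch, not the helper used above: the averages are copied from the printed output, and avgs is a hypothetical name.

```r
# Averages copied from the tibble printed above.
avgs <- data.frame(
  Location = c('Thailand', 'United States of America', 'Italy', 'Türkiye',
               'France', 'Canada', 'Japan', 'Germany',
               'United Kingdom of Great Britain and Northern Ireland',
               'Maldives'),
  Value = c(36.8, 13.6, 8.58, 8.11, 7.67, 7.49, 6.74, 5.90, 4.75, 2.16)
)

# Set the factor levels in increasing order of Value, so that after
# coord_flip() the smallest bar is at the bottom and the largest at the top.
avgs$Location <- factor(avgs$Location,
                        levels = avgs$Location[order(avgs$Value)])

levels(avgs$Location)
```

With the levels set this way, Maldives is the first level and Thailand the last, matching the manual order given above, and the order would update automatically if the averages changed.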

Then, we can create the plot and highlight the countries we want using ggplot.

who_data_filtered_plot %>% 
  ggplot(.,aes(x = Location,y = `Value`, fill = Location)) +
  geom_bar(stat = 'identity') + 
  geom_text(aes(label = round(Value, 2)),fontface = 'bold',size = 5,hjust = -0.3) + 
  theme_minimal() + 
  labs(x = 'Country', y = 'Average Death Rate per 100.000 People', title = 'Average Death Rates per 100.000 People', subtitle = '2000-2019, Turkey, G7 Countries, Highest and Lowest') +
  scale_y_continuous(limits = c(0,45)) + 
  theme(axis.text.x = element_text(size = 12, face = 'bold'),
        axis.title.x = element_text(size = 13, face = 'bold'),
        axis.text.y = element_text(size = 12, face = 'bold'),
        axis.title.y = element_text(size = 13, face = 'bold'),
        legend.position = 'none',
        title = element_text(size = 14, face = 'bold'),
        plot.subtitle = element_text(size = 13, face = 'italic'),
        plot.background = element_rect(fill = '#F4F4F4'),
        panel.background = element_rect(fill = '#F4F4F4'),
        strip.background = element_rect(fill = '#F4F4F4')) + 
  coord_flip() + 
  scale_fill_manual(values = c('#a9c484','grey','grey','grey','grey','grey','#eab281','grey','grey','darkred'))

Looking at the last plot, we can see that Turkey's average road traffic death rate over the 20-year span is close to those of the European countries. The country with the lowest average, Maldives, has a death rate of roughly one fourth of Turkey's, while the country with the highest average, Thailand, has a death rate more than four times Turkey's.
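
The two ratios can be checked directly from the averages printed earlier. This is a small sketch using those rounded values, so the results are approximate; the variable names are hypothetical.

```r
# 20-year averages as printed in the table above (rounded values).
turkiye  <- 8.11
maldives <- 2.16
thailand <- 36.8

# Lowest average relative to Turkey: about 0.27, roughly one fourth.
ratio_low <- round(maldives / turkiye, 2)

# Highest average relative to Turkey: about 4.54, more than four times.
ratio_high <- round(thailand / turkiye, 2)
```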

References

Erenler, A. K., & Gumus, B. (2019). Analysis of Road Traffic Accidents in Turkey between 2013 and 2017. Medicina, 55(10), 679. doi:10.3390/medicina55100679

Esiyok, B., Korkusuz, I., Canturk, G., Alkan, H. A., Karaman, A. G., & Hamit Hanci, I. (2005). Road traffic accidents and disability: A cross-section study from Turkey. Disability and Rehabilitation, 27(21), 1333–1338. doi:10.1080/09638280500164867

Kaygisiz, O., Senbil, M., & Yildiz, A. (2017). Influence of urban built environment on traffic accidents: The case of Eskisehir (Turkey). Case Studies on Transport Policy, 5(2), 306–313. doi:10.1016/j.cstp.2017.02.002

Naci, H., & Baker, T. D. (2008). Productivity losses from road traffic deaths in Turkey. International Journal of Injury Control and Safety Promotion, 15(1), 19–24. doi:10.1080/17457300701847648

Ozturk, E. A. (2022). Burden of deaths from road traffic injuries in children aged 0-14 years in Turkey. Eastern Mediterranean Health Journal, 28(4), 272–280. doi:10.26719/emhj.22.013

Puvanachandra, P., Hoe, C., Ozkan, T., & Lajunen, T. (2012). Burden of Road Traffic Injuries in Turkey. Traffic Injury Prevention, 13(sup1), 64–75. doi:10.1080/15389588.2011.633135

Sungur, I., Akdur, R., & Piyal, B. (2014). Analysis of Traffic Accidents in Turkey. Ankara Medical Journal, 14(3). doi:10.17098/amj.65427